Missing data and the preprocessing perceptron
Abstract
In this paper, several ways to handle missing data, e.g. removing cases, mean imputation, and multiple imputation, are described and discussed. The Pima-Indians-Diabetes data set is used as a case study. This data set is of particular interest because it has not been obvious to all users that it contains a substantial amount of missing data. The data set is described in detail, and the methods for coping with missing data mentioned in the text are applied to it. The preprocessing perceptron is used to train decision support systems on the resulting data sets. A sketch of a way to impute missing data using the preprocessing perceptron is also proposed and discussed. The accuracy of the trained decision support systems, at the optimal efficiency point, lay in the interval 76-82% for the different methods. The highest values were obtained when all cases with missing data were removed from both the test set and the training set. This is, however, not a good way to handle missing data, since the resulting decision support system is biased. Furthermore, it will not be able to handle missing data when used on real data in the future. The results of the remaining methods were surprisingly similar; one reason for this might be that the data set used is rather large. Differences between methods would probably be larger in a smaller data set with a larger amount of missing data.
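As a minimal illustration of two of the strategies named in the abstract, case removal and mean imputation, the following Python sketch applies both to a small made-up array. It is not the paper's implementation: the function names, the use of NaN to mark missing entries, and the toy values are all assumptions made for this example, and multiple imputation and the preprocessing perceptron are not shown.

```python
import numpy as np


def drop_incomplete_cases(X):
    """Case removal: keep only the rows that have no missing (NaN) entries."""
    complete = ~np.isnan(X).any(axis=1)
    return X[complete]


def mean_impute(X):
    """Mean imputation: replace each NaN with the mean of the observed
    values in the same column."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)       # per-column mean, ignoring NaN
    rows, cols = np.where(np.isnan(X))      # positions of missing entries
    X[rows, cols] = col_means[cols]
    return X


# Hypothetical toy data (not taken from the Pima-Indians-Diabetes data set).
X_toy = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
    [5.0, 30.0],
])

X_dropped = drop_incomplete_cases(X_toy)    # two complete rows remain
X_imputed = mean_impute(X_toy)              # all four rows kept, NaN filled
```

Case removal shrinks the data set and cannot be applied to a new incomplete case at prediction time, which mirrors the abstract's objection to it; mean imputation keeps every row at the cost of reducing the variance of the imputed columns.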
Similar resources
Functional preprocessing for multilayer perceptrons
In many applications, high dimensional input data can be considered as sampled functions. We show in this paper how to use this prior knowledge to implement functional preprocessings that allow to consistently reduce the dimension of the data even when they have missing values. Preprocessed functions are then handled by a numerical MLP which approximates the theoretical functional MLP. A succes...
Preprocessing Perceptrons
Reliable results are crucial when working with medical decision support systems. A decision support system should be reliable but also be interpretable, i.e. able to show how it has inferred its conclusions. In this thesis, the preprocessing perceptron is presented as a simple but effective and efficient analysis method to consider when creating medical decision support systems. The preprocessi...
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When the comprehensive information about a topic is scattered among two or more data sets, using only one of those data sets would lead to information loss available in other data sets. Hence, it is necessary to integrate scattered information to a comprehensive unique data set. On the other hand, sometimes we are interested in recognition of duplications in a data set. The i...
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing stage by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...
Solar Radiation Forecasting Using Ad-Hoc Time Series Preprocessing and Neural Networks
In this paper, we present an application of neural networks in the renewable energy domain. We have developed a methodology for the daily prediction of global solar radiation on a horizontal surface. We use an ad-hoc time series preprocessing and a Multi-Layer Perceptron (MLP) in order to predict solar radiation at daily horizon. First results are promising with nRMSE < 21% and RMSE < 998 Wh/m2...
Publication date: 2004